Abstract

Gaze detection and head orientation are an important part of many
advanced human-machine interaction applications. Many systems have been
proposed for gaze detection. Typically, they require some form of user
cooperation and calibration. Additionally, they may require multiple
cameras and/or restricted head positions. We present a new approach for
inference of both face orientation and gaze direction from a single image
with no restrictions on the head position. Our algorithm is based on a
face and eye model, deduced from anthropometric data. This approach
allows us to use a single camera and requires no cooperation from the
user. Using a single image avoids the complexities associated with a
multi-camera system. Evaluation tests show that our system is accurate,
fast and can be used in a variety of applications, including ones where
the user is unaware of the system.



1 Introduction 

Eyes are a major way to acquire information about humans. Human
attention, intention and even desire are closely related to gaze. As such,
many applications require gaze detection. Examples are driver attention
monitoring and human-computer interfaces for multimedia or medical
purposes. The large number of algorithms proposed for this task suggests
that no solution is completely satisfying. Since citing all the works
related to the subject is impossible, we focus on some recent and
important contributions. A good survey of some gaze detection techniques
can be found in [3]. In [5,7], a stereo system for gaze and face pose
computation is presented, which is particularly suitable for monitoring
driver vigilance. Both systems are based on two cameras, one being a
narrow-field camera (which provides a high resolution image of the eyes by
tracking a small area) and the second being a wide-field camera (which
tracks the whole face). Besides the computational complexity arising from
multiple cameras and from controlling these pan-tilt cameras, the system
hardware is quite costly. In [6], a monocular system is presented, which
uses a personal calibration process for each user and does not allow large
head motions. Limiting the head motion is typical for systems that use
only a single camera. The system in [6] uses a (motorized) auto-focus lens
to estimate the distance of the face from the camera. In [9], the eye gaze
is computed by using the fact that the iris contour, while being a circle
in 3D, is an ellipse in the image under perspective projection. The
drawback of this approach is that a high resolution image of the iris area
is necessary, which severely limits the possible motions of the user,
unless an additional wide-angle camera is used.

In this paper we introduce a new approach with several advantages. The 
system is monocular, hence the difficulties associated with multiple cameras 
are avoided. The camera parameters are maintained constant in time. The 
system requires no personal calibration and the head is allowed to move freely. 
This is achieved by using a model of the face, deduced from anthropometric 
features. 
This approach, of a mechanically simple, automatic and non-intrusive
system, allows eye-gaze detection to be used in a variety of applications
where it was not an option before. For example, such a system may be
installed in mass-produced cars. With the growing concern over car
accidents, customers and regulators are demanding safer cars. Active
sensors that may prevent accidents are actively pursued. A non-intrusive,
cheaply produced, one-size-fits-all eye-gaze system could monitor driver
vigilance at all times. Drowsiness and inattention can immediately
generate alarms. In conjunction with other active sensors, e.g. radar or
obstacle detection, the driver can be warned of an unnoticed hazard
outside the car.

Psychophysical and psychological tests and experiments with uncoopera- 
tive subjects such as children and/or primates, may also benefit from such a 
static (no moving parts) system, which allows the subject to focus solely on 
the task at hand while remaining oblivious to the eye-gaze system. 

In conjunction with additional higher-level systems, a covert eye-gaze
system may be useful in security applications, for example monitoring the
eye-gaze of ATM clients. In automated airport check-in counters, such a
system may flag suspiciously behaving individuals.

The paper is organized as follows. In section 2 we present the core of
the paper: the face model that we use and how this model leads to the
computation of the Euclidean 3D orientation and position of the face.
Simulations are presented that show the results are robust to error in
both the model and the measurements. Section 3 gives an overview of the
system, and some experiments are presented in section 4.



2 Face Model and Geometric Analysis 



2.1 Face Model 

Following the statistical data taken from [1], we assume the following model 
for a generic human face. Let A and B be the centers of the eyes, and let 
C be the middle point between the nostrils. Then we assume the following 
model: 



$$d(A, C) = d(B, C) \tag{1}$$
$$d(A, B) = r \, d(A, C) \tag{2}$$
$$d(A, B) = 6.5\,\mathrm{cm} \tag{3}$$



where r = 1.0833. The first two equations allow computing the orientation
of the face, while the third equation is necessary for computing the
distance between the camera and the face. The face model is illustrated in
figure 1.



Figure 1: The face model is essentially based on the fact that the triangle Eye- 
Nose Bottom-Eye is isosceles. 



2.2 3D Face Orientation 



Let M be the camera matrix. All the computations are done in the
coordinate system of the camera. Therefore the camera matrix has the
following expression:

$$M = K[I; 0],$$

where K is the matrix of internal parameters [2,4].

Let (a, b, c) be the projection of (A, B, C) onto the image. In the
equations below, the image points a, b, c are given by their projective
coordinates in the image plane, while the 3D points A, B, C are given by
their Euclidean coordinates in $\mathbb{R}^3$. Given these notations, the
projection equations are:

$$a \sim KA \tag{4}$$
$$b \sim KB \tag{5}$$
$$c \sim KC \tag{6}$$

where $\sim$ means equality up to a scale factor. Therefore the 3D points
are given by the following expressions:

$$A = \alpha K^{-1} a \tag{7}$$
$$B = \beta K^{-1} b \tag{8}$$
$$C = \gamma K^{-1} c \tag{9}$$

where $\alpha$, $\beta$, $\gamma$ are unknown scale factors. These could
also be deduced by considering the points at infinity of the optical rays
generated by the image points a, b, c and the camera center. These points
at infinity are simply given in projective coordinates by
$[K^{-1}a, 0]^T$, $[K^{-1}b, 0]^T$, $[K^{-1}c, 0]^T$. Then the points A,
B, C are given in projective coordinates by $[\alpha K^{-1}a, 1]^T$,
$[\beta K^{-1}b, 1]^T$, $[\gamma K^{-1}c, 1]^T$. These expressions
naturally yield the equations (7), (8), (9) giving the Euclidean
coordinates of the points.

Plugging these expressions of A, B and C into the first two equations of
the model, (1) and (2), leads to two homogeneous quadratic equations in
$\alpha$, $\beta$, $\gamma$:

$$f(\alpha, \beta, \gamma) = 0 \tag{10}$$
$$g(\alpha, \beta, \gamma) = 0 \tag{11}$$

Thus finding the points A, B and C is now reduced to finding the
intersection of two conics in the projective plane. Moreover, since no
solution is on the line defined by $\gamma = 0$ (since the nose of the user
is not located at the camera center!), one can reduce the computation to
the affine piece defined by $\gamma = 1$. Hence we shall now focus our
attention on the following system:

$$f(\alpha, \beta, 1) = 0 \tag{12}$$
$$g(\alpha, \beta, 1) = 0 \tag{13}$$

This system defines the intersection of two conics in the affine plane.
The following subsection is devoted to the computation of the solutions of
this system.
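
For concreteness, the two conics can be written out explicitly. The expansion below is a worked sketch, not taken from the original text: it assumes the squared forms of model equations (1) and (2), and it introduces the shorthand u, v, w for the back-projected image points purely for brevity.

```latex
\begin{align*}
u &= K^{-1}a, \qquad v = K^{-1}b, \qquad w = K^{-1}c, \\
f(\alpha,\beta,\gamma)
  &= \|\alpha u - \gamma w\|^2 - \|\beta v - \gamma w\|^2 \\
  &= \alpha^2\|u\|^2 - \beta^2\|v\|^2
     - 2\alpha\gamma\,(u \cdot w) + 2\beta\gamma\,(v \cdot w), \\
g(\alpha,\beta,\gamma)
  &= \|\alpha u - \beta v\|^2 - r^2\,\|\alpha u - \gamma w\|^2 \\
  &= (1 - r^2)\,\alpha^2\|u\|^2 + \beta^2\|v\|^2 - r^2\gamma^2\|w\|^2
     - 2\alpha\beta\,(u \cdot v) + 2r^2\alpha\gamma\,(u \cdot w).
\end{align*}
```

Both expressions are homogeneous of degree 2 in the three scale factors, which is why each defines a conic.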

2.3 Computing the Intersection of Conics in the Affine Plane

For the sake of completeness, we briefly recall one way of computing the
solutions of the system above. For more details, see [8]. Consider first
two polynomials $f, g \in \mathbb{C}[x]$. The resultant gives a way to know
if the two polynomials have a common root. Write the polynomials as
follows:

$$f = a_n x^n + \ldots + a_1 x + a_0$$
$$g = b_p x^p + \ldots + b_1 x + b_0$$

The resultant of f and g is a polynomial r, which is a combination of
monomials in $\{a_i\}_{i=0,\ldots,n}$ and $\{b_j\}_{j=0,\ldots,p}$ with
coefficients in $\mathbb{Z}$, that is $r \in \mathbb{Z}[a_i, b_j]$. The
resultant r vanishes if and only if either $a_n$ and $b_p$ are both zero or
the polynomials have a common root in $\mathbb{C}$. The resultant can be
computed as the determinant of a polynomial matrix. There exist several
matrices whose determinant is equal to the resultant. The best known and
simplest matrix is the so-called Sylvester matrix, defined as follows (p
shifted columns of coefficients of f, followed by n shifted columns of
coefficients of g):

$$S(f,g) = \begin{pmatrix}
a_n     &         &         & b_p     &         &         \\
a_{n-1} & a_n     &         & b_{p-1} & b_p     &         \\
\vdots  & a_{n-1} & \ddots  & \vdots  & b_{p-1} & \ddots  \\
a_0     & \vdots  & a_n     & b_0     & \vdots  & b_p     \\
        & a_0     & a_{n-1} &         & b_0     & b_{p-1} \\
        &         & \vdots  &         &         & \vdots  \\
        &         & a_0     &         &         & b_0
\end{pmatrix}$$

Therefore, we have:

$$r = \det(S(f,g)).$$

In addition to this expression, which gives a practical way to compute the
resultant, there exists another formula of theoretical interest:

$$r = a_n^p \, b_p^n \prod_{\alpha,\beta} (x_\alpha^f - x_\beta^g),$$

where $x_\alpha^f$ are the roots of f and $x_\beta^g$ are those of g. It
can be shown that the resultant is a polynomial of degree np.
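
As an illustration of the univariate case, the following is a minimal sketch that builds the Sylvester matrix in the column-shifted layout and evaluates its determinant numerically. The function names `sylvester` and `resultant` are hypothetical, and coefficients are assumed to be listed from the highest degree down.

```python
import numpy as np

def sylvester(f, g):
    """Sylvester matrix of f and g (coefficients from highest degree down)."""
    n, p = len(f) - 1, len(g) - 1
    S = np.zeros((n + p, n + p))
    for j in range(p):            # p shifted copies of the coefficients of f
        S[j:j + n + 1, j] = f
    for j in range(n):            # n shifted copies of the coefficients of g
        S[j:j + p + 1, p + j] = g
    return S

def resultant(f, g):
    """Resultant as the determinant of the Sylvester matrix."""
    return np.linalg.det(sylvester(f, g))

# (x - 1)(x - 2) and (x - 2)(x + 3) share the root x = 2,
# so the resultant is zero up to rounding.
shared = resultant([1, -3, 2], [1, 1, -6])

# (x - 1)(x - 2) and (x - 3)(x + 3) have no common root,
# so the resultant is non-zero.
disjoint = resultant([1, -3, 2], [1, 0, -9])
```

A floating-point determinant is only a rough test for a common root; for badly scaled coefficients a tolerance relative to the coefficient magnitudes would be needed.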

An important point is that the resultant is also defined and has the same
properties if the coefficients of the polynomials are not only numbers but
also polynomials in another variable. Hence, consider now that
$f, g \in \mathbb{C}[x, y]$ and write:

$$\begin{aligned}
f &= a_n(x) y^n + \ldots + a_1(x) y + a_0(x) \\
g &= b_p(x) y^p + \ldots + b_1(x) y + b_0(x)
\end{aligned} \tag{14}$$

The question is now the following: given a value $x_0$ of x, do the two
polynomials $f(x_0, y)$ and $g(x_0, y)$ have a common root? The answer to
this question is based on the computation of the resultant of f and g with
respect to y (i.e. using the presentation given by (14)). This is a
univariate polynomial in x, denoted by $r(x) = \mathrm{res}(f, g, y)$.

The resultant can be used in many contexts. For our purpose, we will use
it to compute the intersection points of two planar algebraic curves.
Consider the curve $C_1$ (respectively $C_2$) defined as the set of points
(x, y) which are roots of f(x, y) (respectively g(x, y)). We want to
compute the intersection of $C_1$ and $C_2$. Algebraically, this is
equivalent to computing the common roots of f and g. Therefore, we use the
following procedure:

• Compute the resultant $r(x) = \mathrm{res}(f, g, y) \in \mathbb{C}[x]$.

• Find the roots of $r(x)$: $x_1, \ldots, x_t$.

• For each $i = 1, \ldots, t$, compute the common roots of $f(x_i, y)$ and
$g(x_i, y)$ in $\mathbb{C}[y]$: $y_{i1}, \ldots, y_{ik_i}$.

• The intersection of $C_1$ and $C_2$ is therefore:

$(x_1, y_{11}), \ldots, (x_1, y_{1k_1}), \ldots, (x_t, y_{t1}), \ldots, (x_t, y_{tk_t})$.
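
The procedure above can be sketched numerically for two conics. This is an illustrative sketch under stated assumptions, not the authors' implementation: each conic is assumed to be given as a hypothetical coefficient tuple (A, B, C, D, E, F) for A x^2 + B xy + C y^2 + D x + E y + F, the resultant is recovered by sampling the Sylvester determinant at five values of x and interpolating a degree-4 polynomial (a shortcut chosen here for brevity), and the tolerance `tol` is arbitrary. The sketch also assumes simple intersections; tangent conics give multiple roots that are numerically fragile.

```python
import numpy as np

def conic_in_y(c, x):
    """Coefficients in y (highest first) of the conic c at a fixed x."""
    A, B, C, D, E, F = c
    return np.array([C, B * x + E, A * x * x + D * x + F])

def sylvester_det(f, g):
    """Determinant of the Sylvester matrix of two coefficient vectors."""
    n, p = len(f) - 1, len(g) - 1
    S = np.zeros((n + p, n + p))
    for j in range(p):
        S[j:j + n + 1, j] = f
    for j in range(n):
        S[j:j + p + 1, p + j] = g
    return np.linalg.det(S)

def intersect_conics(cf, cg, tol=1e-6):
    # Step 1: r(x) = res(f, g, y) has degree at most 4, so five samples
    # determine it exactly (up to rounding).
    xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    rs = [sylvester_det(conic_in_y(cf, x), conic_in_y(cg, x)) for x in xs]
    r = np.polyfit(xs, rs, 4)
    r[np.abs(r) < 1e-9 * np.abs(r).max()] = 0.0   # drop rounding noise
    points = []
    # Step 2: roots of r(x).
    for x in np.roots(r):
        # Step 3: roots of f(x_i, y) that are also roots of g(x_i, y).
        for y in np.roots(conic_in_y(cf, x)):
            if abs(np.polyval(conic_in_y(cg, x), y)) < tol:
                points.append((x, y))
    # Step 4: the collected pairs are the intersection points.
    return points

# Circle x^2 + y^2 = 5 and hyperbola x y = 2.
pts = intersect_conics((1, 0, 1, 0, 0, -5), (0, 1, 0, 0, 0, -2))
```

In the face-orientation setting, x and y play the roles of the two unknown scale factors of the system (12)-(13), and only the real solutions are of interest.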

In our context, the resultant r is a polynomial of degree 4 and so
$t \le 4$ and $k_i \le 2$. To complete the picture, we just need to mention
an efficient and reliable way to compute the roots of a univariate
polynomial. The algorithm that we will describe is very efficient and
robust for low degree polynomials. Given a univariate polynomial
$p(x) = a_n x^n + \ldots + a_1 x + a_0$, one can form the following matrix,
called the companion matrix of p:

$$C(p) = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0/a_n & -a_1/a_n & -a_2/a_n & \cdots & -a_{n-1}/a_n
\end{pmatrix}$$

A short computation shows that the characteristic polynomial of C(p) is
equal, up to sign, to $\frac{1}{a_n} p$. Thus the roots of p are exactly
the eigenvalues of C(p). This provides one practical way to compute the
roots of a univariate polynomial.
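
The companion-matrix method is short enough to sketch directly. The helper name `companion` is hypothetical, and coefficients are assumed to be passed as [a_n, ..., a_1, a_0]; the eigenvalues are obtained with NumPy's general eigenvalue solver.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of p(x) = a_n x^n + ... + a_0, given [a_n, ..., a_0]."""
    a = np.asarray(coeffs, dtype=float)
    n = len(a) - 1
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)       # ones on the superdiagonal
    C[-1, :] = -a[:0:-1] / a[0]      # last row: -a_0/a_n, ..., -a_{n-1}/a_n
    return C

# Roots of x^4 - 5 x^2 + 4 = (x^2 - 1)(x^2 - 4) as eigenvalues of C(p).
roots = np.linalg.eigvals(companion([1, 0, -5, 0, 4]))
```

For the degree-4 resultant of this paper the matrix is only 4 x 4, so the eigenvalue computation is cheap.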



2.4 3D Face Orientation 

Therefore, we solve the system S defined by equations (12) and (13) using
the approach presented above. By Bezout's theorem (or simply by looking at
the degree of the resultant), we know that there are at most 4 complex
solutions to this system. Experiments show that the system generated by
the image of a human face has only two real roots. The ambiguity between
these two roots is easily handled, since one solution leads to a
non-realistic inter-eyes distance. Let $(\alpha_0, \beta_0)$ be the right
solution. Then the points A, B and C are known up to a unique scale
factor. We shall denote by $A_0$, $B_0$ and $C_0$ the points obtained from
the solution $(\alpha_0, \beta_0)$. Thus we have the following
expressions:

$$A_0 = \alpha_0 K^{-1} a \tag{15}$$
$$B_0 = \beta_0 K^{-1} b \tag{16}$$
$$C_0 = K^{-1} c \tag{17}$$

Thus we have the following relations too: $A = \gamma A_0$,
$B = \gamma B_0$ and $C = \gamma C_0$.

The computation of $\gamma$ is done using the third model equation (3).
Once the face points are computed, one can compute the distance between
the user's face and the camera, and also the 3D orientation of the face.
Indeed the normal to the plane defined by A, B and C is given by:

$$N = \overrightarrow{AB} \wedge \overrightarrow{AC},$$

where $\wedge$ is the cross product.
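
The final scaling and normal computation can be sketched as follows. This is a minimal sketch that assumes the pair (alpha0, beta0) has already been selected among the solutions of the conic system; the function name `face_pose` and its argument layout are hypothetical, and image points are assumed to be given in homogeneous pixel coordinates (u, v, 1).

```python
import numpy as np

EYE_DISTANCE_CM = 6.5   # inter-eyes distance of model equation (3)

def face_pose(K, a, b, c, alpha0, beta0):
    """Return A, B, C in cm (camera frame) and the unit normal of the face plane."""
    Kinv = np.linalg.inv(K)
    A0 = alpha0 * (Kinv @ a)          # equation (15)
    B0 = beta0 * (Kinv @ b)           # equation (16)
    C0 = Kinv @ c                     # equation (17)
    # The common scale follows from d(A, B) = 6.5 cm.
    gamma = EYE_DISTANCE_CM / np.linalg.norm(A0 - B0)
    A, B, C = gamma * A0, gamma * B0, gamma * C0
    normal = np.cross(B - A, C - A)   # AB ^ AC
    return A, B, C, normal / np.linalg.norm(normal)
```

The distance between the camera and the face can then be taken, for example, as the norm of C or of the centroid of the three points; the text does not specify which point is used.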

2.5 Robustness to Errors in Model and Detection 

In order to estimate the sensitivity of this algorithm to errors in the
model and in detection, we performed several simulations. As we shall
detail in subsection 3.1, we use a rather high resolution camera.
Therefore in the simulation, we start from the following setting:

• The focal length f = 4000 in pixels,

• The principal point is at the image center,

• The distance between the camera and the face is 60cm.

The simulations are done according to the following protocol. An
artificial face, defined by three points in space, say A, B and C, is
projected onto a known camera. Given a parameter p, we perturb p by a
white Gaussian noise of standard deviation $\sigma$. For each value of
$\sigma$, we perform 100 random perturbations. For each value of p that we
obtain by this process, we compute the error in the 3D reconstruction, as
the mean of the square errors.

The first simulation (see figure 2) shows that the system is very robust
to error in the estimation of the focal length, since for a noise with
standard deviation of 100 (in pixels), the reconstruction error is 1.2cm,
meaning less than 1% of the distance between the camera and the user.

Figure 2: Influence of the error in focal length (horizontal axis: noise
standard deviation).

The next two simulations aim at measuring the influence of errors in the
model. First, the assumed inter-eyes distance is corrupted by a Gaussian
white noise (figure 3). The mean value is 6.5cm as mentioned in section
2.1. For a standard deviation of 0.5, which represents an extreme anomaly
with respect to the standard human morphology, the reconstruction error is
about 3.3cm, less than 2% of the distance between the camera and the user.
The influence of the human ratio r, as defined in equation (2), is also
tested, by adding a Gaussian white noise, centered at the "universal"
value 1.0833 (figure 4). For a standard deviation of 0.15, which also
represents a very strong anomaly, the reconstruction error is 1.75cm, just
more than 1% of the distance between the camera and the user.

Figure 3: Influence of the error in inter-eyes distance (horizontal axis:
noise standard deviation).

Figure 4: Influence of the error in human ratio.

After measuring the influence of errors in camera calibration and model,
the next step is to evaluate the sensitivity to input data perturbation.
The image points are corrupted by a Gaussian white noise (figure 5). For
noise of 10 pixels, which is a large error in detection, the
reconstruction error is less than 2cm, about 1.15% of the distance between
the camera and the user.

Figure 5: Influence of the error in image points.

The accuracy of the system is mainly due to the fact that the focal length
is high (f = 4000 in pixels). Indeed when computing the optical rays
generated by the image points, as in equations (7), (8) and (9), we use
the inverse of K, which is roughly equivalent to multiplying the image
point coordinates by 1/f. Hence the larger f is, the less a detection
error has an impact on the computation.
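
The 1/f argument can be checked numerically with a small sketch. The function name `ray_angle_error` is hypothetical, and the principal point (696, 520) is an assumption: it is the center of a 1392x1040 image. The comparison focal length of 1000 pixels is arbitrary and used only for contrast.

```python
import numpy as np

def ray_angle_error(f, pixel_error, principal_point=(696.0, 520.0)):
    """Angle in radians between the optical rays of two image points that
    differ by pixel_error pixels horizontally, for a focal length f in pixels."""
    cx, cy = principal_point
    K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ np.array([cx, cy, 1.0])
    r2 = Kinv @ np.array([cx + pixel_error, cy, 1.0])
    # arctan2 of |r1 x r2| and r1 . r2 stays accurate for very small angles.
    return np.arctan2(np.linalg.norm(np.cross(r1, r2)), r1 @ r2)

err_long = ray_angle_error(4000.0, 10.0)    # about 10 / 4000 radians
err_short = ray_angle_error(1000.0, 10.0)   # about 10 / 1000 radians
```

The same 10-pixel detection error therefore bends the optical ray roughly four times less at f = 4000 than at f = 1000.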



3 Overview Of The System 
3.1 System Architecture 

The main goal of this work was to create a non-intrusive gaze detection 
system, that would require no user cooperation while keeping the system 
complexity low. We use a high-resolution 15 fps, 1392x1040 video camera 
with a 25mm fixed-focus lens. This setup allows both a wide field of view, for 
a broad range of head positions, and high resolution images of the eyes. Since 
we can estimate the 3D head position from a single image, we can use a fixed 
focus lens instead of a motorized auto-focus lens. This makes the camera 
calibration simpler and the calibration of the internal parameters is done 
only once. The system uses an IR LED at a known position to illuminate 
the user's face. 
3.2 System Overview

The general flow of the system is depicted in Figure 6. For every new
frame the glints, the reflections of the LED light from the eye corneas as
seen by the camera, are detected and their corresponding pupils are found.
The search area for the nose is then defined, and the nose bottom is
found. Given the two glints and the nose position, we can reconstruct the
complete Euclidean 3D face position and orientation relative to the
camera, using the geometric algorithm presented in section 2. This
reconstruction gives us the exact 3D position of the glints and pupils.
Then, for each eye, the 3D cornea center is computed using the knowledge
of the LED position, as shown in figure 8. This model is similar to the
eye model used in [6]. The following subsections 3.3 and 3.4 describe
these stages in more detail.

3.3 Feature Detection 
3.3.1 Glint and Pupil Detection 

The detection of the glints is done in several steps. Glints appear as
very bright dots in the image, usually at the highest possible grayscale
values. Using a thresholding operation on the image yields multiple
candidates for possible glints. Examples of other sources with similar
characteristics are background lights, facial hair, teeth and eye-glass
lenses and frames. We perform multiple filtering stages to identify the
true glints. We filter these candidates by size, i.e. we select only the
small dot-like ones. Next, we pair up the remaining candidates and select
only those glint pairs that obey certain distance and angle rules and
ranges.

Figure 6: The system flow chart, showing the different stages of the
process: Get New Frame -> Detect Cornea Glints -> Detect Pupils -> Detect
Glint Pairs -> Detect Nose Bottom -> Calculate 3D Head Position and
Orientation -> Calculate 3D position of Cornea Center, Glint and Pupil ->
Calculate Gaze.

We next proceed to the detection of the pupils. The pupils serve two
purposes. They are used to filter out incorrect glint pairs, and they are
required for the calculation of the gaze direction in the later stages of
the algorithm. Pupils appear as round or oval dark regions inside the eye
and are very close to (or behind) the glints. We search for these dark
regions around each of our detected glints. Glint pairs containing a glint
around which no pupil was found are removed. This final glint filtering
will usually leave us with the final true glint pair. Otherwise, we choose
the top-most pair, as empirically it was shown to be the correct one.
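
The pairing rules and the top-most fallback can be sketched as below. The text does not give the actual distance and angle ranges, so the three thresholds are hypothetical placeholders, as is the function name `select_glint_pair`; the pupil-based filtering step is omitted, and candidates are assumed to be (x, y) centroids in pixels with y growing downward.

```python
import math
from itertools import combinations

def select_glint_pair(candidates, min_dist=60.0, max_dist=400.0, max_tilt_deg=30.0):
    """Pick a plausible glint pair from candidate (x, y) centroids in pixels.

    min_dist, max_dist and max_tilt_deg are hypothetical placeholder values.
    """
    pairs = []
    for p, q in combinations(candidates, 2):
        dx, dy = q[0] - p[0], q[1] - p[1]
        dist = math.hypot(dx, dy)
        tilt = math.degrees(math.atan2(abs(dy), abs(dx)))   # 0 = horizontal pair
        if min_dist <= dist <= max_dist and tilt <= max_tilt_deg:
            pairs.append((p, q))
    if not pairs:
        return None
    # Image y grows downward, so the top-most pair has the smallest mean y.
    return min(pairs, key=lambda pq: (pq[0][1] + pq[1][1]) / 2.0)

# Two plausible pairs survive the rules; the upper one is returned.
best = select_glint_pair([(100, 200), (300, 205), (120, 600), (310, 610), (305, 900)])
```

In practice the ranges would depend on the image resolution and on the expected range of distances between the user and the camera.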

3.3.2 Nose Detection 

The detection of the nose bottom is done by searching for dark-bright-dark
patterns in the area just below the eyes. Indeed, the nostrils appear as
dark blobs in the image thanks to the relative position of the camera and
the face, as shown in figure 7. The size and orientation of this search
area are determined by the distance and orientation of the chosen glint
pair. Once dark-bright-dark patterns are found, we use connected component
blob analysis on this region to identify only those dark blobs that obey
certain size, shape, distance and relative angle rules that yield
plausible nostrils. The nose bottom is selected as the point just between
the two nostrils.

Figure 7: The camera is viewing the eyes and the nostrils.

3.4 Gaze Detection 

Once the glints and the bottom point of the nose have been detected in the
image, one can apply the geometric algorithm presented in section 2 to
compute the 3D face orientation. As seen in subsection 2.5, even if the
glints are not exactly located in the center of the eye, the system
returns an accurate answer. Then for each eye, the cornea center is
computed using the knowledge of the LED position, as shown in figure 8.
This model is similar to the one presented in [6].
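
The bisector construction of Figure 8 can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `cornea_center` is hypothetical, the camera center is assumed to be at the origin of the camera frame, all coordinates are in cm, and the cornea radius of 0.77 cm is an assumption made here (the value printed in the Figure 8 caption read as 7.7 mm).

```python
import numpy as np

CORNEA_RADIUS_CM = 0.77   # assumption: cornea radius taken as 7.7 mm

def cornea_center(glint, led, camera=None, radius=CORNEA_RADIUS_CM):
    """Cornea center from the 3D glint point and the known LED position."""
    glint = np.asarray(glint, dtype=float)
    led = np.asarray(led, dtype=float)
    camera = np.zeros(3) if camera is None else np.asarray(camera, dtype=float)
    to_led = (led - glint) / np.linalg.norm(led - glint)
    to_cam = (camera - glint) / np.linalg.norm(camera - glint)
    # By the reflection law, the bisector of the LED-glint-camera angle is
    # the outward surface normal of the cornea at the glint.
    bisector = to_led + to_cam
    bisector = bisector / np.linalg.norm(bisector)
    # The center lies one radius behind the surface along that normal.
    return glint - radius * bisector
```

The construction treats the cornea as a spherical mirror, which is the usual simplification in glint-based eye models.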

Figure 8: The cornea center lies on the bisector of the angle defined by
the LED, the glint point in 3D and the camera. Its exact location is given
by the cornea radius, which is 7.7mm.

The gaze line is defined as being the line joining the cornea center and
the pupil center in 3D. The pupil center is first detected in the image
and computed in 3D as follows. The distance between the pupil center and
the cornea center is a known anatomical constant, equal to 0.45 cm.
Consider then a sphere S centered at the cornea center, with radius equal
to 0.45 cm. The pupil center lies on the optical ray generated by its
projection onto the image and the camera center. This ray intersects the
sphere S in two points. The closest of these points to the camera is the
pupil center.
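
The ray-sphere intersection and the resulting gaze line can be sketched as below. The function names `pupil_center_3d` and `gaze_direction` are hypothetical; the sketch assumes the camera center is at the origin of the camera frame, that the pupil is given in pixel coordinates, and that the cornea center is expressed in cm.

```python
import numpy as np

PUPIL_TO_CORNEA_CM = 0.45   # distance between pupil center and cornea center

def pupil_center_3d(K, pupil_px, cornea_center, radius=PUPIL_TO_CORNEA_CM):
    """Intersect the optical ray of the pupil image point with the sphere S."""
    c = np.asarray(cornea_center, dtype=float)
    d = np.linalg.inv(K) @ np.array([pupil_px[0], pupil_px[1], 1.0])
    d = d / np.linalg.norm(d)
    # Points on the ray are t * d; solve |t d - c|^2 = radius^2 for t.
    b = d @ c
    disc = b * b - (c @ c - radius * radius)
    if disc < 0:
        return None                 # the ray misses the sphere
    t = b - np.sqrt(disc)           # smaller t: the point closest to the camera
    return t * d

def gaze_direction(cornea_center, pupil_center):
    """Unit vector along the gaze line, from the cornea center to the pupil center."""
    g = np.asarray(pupil_center, dtype=float) - np.asarray(cornea_center, dtype=float)
    return g / np.linalg.norm(g)
```

Returning `None` when the ray misses the sphere is a design choice of this sketch; it signals a detection that is inconsistent with the eye model.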

4 Experiments 

We show sample images produced by the system, where one can see the
detected triangle, made of the eyes' centers and the bottom point of the
nose. In addition, the gaze is reprojected onto the images and rendered by
white arrows (figures 9, 10 and 11).

Figure 9: The detected triangle, eyes' centers and the nose bottom,
together with the gaze line.

5 Discussion 

We proposed an automatic, non-intrusive eye-gaze system. It uses an
anthropometric model of the human face to calculate the face distance,
orientation and gaze angle, without requiring any user-specific
calibration. This generality, as seen in subsection 2.5, does not
introduce large errors into the gaze direction computation.

Figure 10: The detected triangle, eyes' centers and the nose bottom,
together with the gaze line.

While the benefits of a calibration-free system allow for a broad range of
previously impossible applications, the system design allows for easy
plugging of user-specific calibration data, which will increase the
accuracy even more.